Measures of Clustering Quality: A Working Set of Axioms for Clustering
نویسندگان
چکیده
Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time.
منابع مشابه
Clustering Quality Measures
Aiming towards the development of a general clustering theory, addressing issues that are common to the different clustering paradigms, we wish to initiate a systematic study of measures for the quality of a given data clustering. A clustering quality measure is a function that, given a data set and its partition into clusters, returns a non-negative real number representing the quality of that...
متن کاملارائه یک الگوریتم خوشه بندی برای داده های دسته ای با ترکیب معیارها
Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...
متن کاملNew distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...
متن کاملGROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION
Stochastic nature of earthquake has raised a challenge for engineers to choose which record for their analyses. Clustering is offered as a solution for such a data mining problem to automatically distinguish between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek for the best clustering measures. In...
متن کاملA Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
متن کامل